Novelty Detection via Topic Modeling in Research Articles
Abstract
Redundancy is a critical problem in almost all domains. Novelty detection is the identification of new or unknown data or signals that a machine learning system was not exposed to during training. The problem is more acute for research articles, where a method for identifying novelty in each section of an article is needed to determine the novel idea the paper proposes. Because research articles are semi-structured, detecting novel information in them requires more accurate systems. Topic models provide a useful and simple means to process and analyze such documents. This work compares the most widely used topic model, Latent Dirichlet Allocation, with the hierarchical Pachinko Allocation Model. The results favor the hierarchical Pachinko Allocation Model when used for document retrieval.
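As a rough illustration of how topic distributions can feed a novelty score, the sketch below compares a new document's topic mixture with those of known documents. It is not the method evaluated in this work. It assumes the mixtures have already been inferred by some topic model (for example Latent Dirichlet Allocation), and it uses Jensen-Shannon divergence as one possible dissimilarity measure. The function names and the sample mixtures are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    # Midpoint distribution between p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def novelty_score(new_doc, known_docs):
    """Divergence from the new document to its closest known document.

    A score near 0 means some known document has almost the same topic
    mixture; a score near 1 means the mixture is unlike anything seen.
    """
    return min(js_divergence(new_doc, doc) for doc in known_docs)


# Hypothetical topic mixtures over three topics (assumed output of a topic model).
known = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
redundant = [0.8, 0.1, 0.1]   # same mixture as the first known document
novel = [0.05, 0.05, 0.9]     # dominated by a topic the known documents barely cover

print(novelty_score(redundant, known))
print(novelty_score(novel, known))
```

Taking the minimum over known documents means a document counts as novel only if it differs from every document already seen. A threshold on this score would have to be tuned on labeled data.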
Similar resources
Method for Novelty Recommendation Using Topic Modelling
Content-based filtering methods fall short in situations where there are many similar items to recommend from, for instance when recommending articles from multiple news portals. To deal with this problem, we can consider the novelty of recommendations. Detecting novelty is usually implemented as finding the most dissimilar articles. We propose a method that uses topic modelling to find the nov...
Modeling Topic-Level Academic Influence in Scientific Literatures
Scientific articles are not born equal. Some generate an entire discipline while others make relatively fewer contributions. When reviewing scientific literatures, it would be useful to identify those important articles and understand how they influence others. In this paper, we introduce J-Index, a quantitative metric modeling topic-level academic influence. J-Index is calculated based on the ...
TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection
Detecting novelty of an entire document is an Artificial Intelligence (AI) frontier problem that has widespread NLP applications, such as extractive document summarization, tracking development of news events, predicting impact of scholarly articles, etc. Important though the problem is, we are unaware of any benchmark document level data that correctly addresses the evaluation of automatic nov...
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
The global cyberspace networks provide individuals with platforms to interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
Spotting Rumors via Novelty Detection
Rumour detection is hard because the most accurate systems operate retrospectively, only recognising rumours once they have collected repeated signals. By then the rumours might have already spread and caused harm. We introduce a new category of features based on novelty, tailored to detect rumours early on. To compensate for the absence of repeated signals, we make use of news wire as an addit...